Machine learning for the prediction of acute kidney injury in patients with sepsis

Background Acute kidney injury (AKI) is the most common and serious complication of sepsis, accompanied by high mortality and disease burden. The early prediction of AKI is critical for timely intervention and ultimately improves prognosis. This study aims to establish and validate predictive models based on novel machine learning (ML) algorithms for AKI in critically ill patients with sepsis. Methods Data of patients with sepsis were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Feature selection was performed using a Boruta algorithm. ML algorithms such as logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), decision tree, random forest, Extreme Gradient Boosting (XGBoost), and artificial neural network (ANN) were applied for model construction by utilizing tenfold cross-validation. The performances of these models were assessed in terms of discrimination, calibration, and clinical application. Moreover, the discrimination of ML-based models was compared with those of Sequential Organ Failure Assessment (SOFA) and the customized Simplified Acute Physiology Score (SAPS) II model. Results A total of 3176 critically ill patients with sepsis were included for analysis, of which 2397 cases (75.5%) developed AKI during hospitalization. A total of 36 variables were selected for model construction. The models of LR, KNN, SVM, decision tree, random forest, ANN, XGBoost, SOFA and SAPS II score were established and obtained area under the receiver operating characteristic curves of 0.7365, 0.6637, 0.7353, 0.7492, 0.7787, 0.7547, 0.821, 0.6457 and 0.7015, respectively. The XGBoost model had the best predictive performance in terms of discrimination, calibration, and clinical application among all models. Conclusion The ML models can be reliable tools for predicting AKI in septic patients. The XGBoost model has the best predictive performance, which can be used to assist clinicians in identifying high-risk patients and implementing early interventions to reduce mortality. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03364-0.


Introduction
Acute kidney injury (AKI) is a common and complex clinical complication in intensive care unit (ICU) settings [1]. In the ICU, approximately 53% of AKI is caused by sepsis and subsequently contributes to longer hospital stay, higher morbidity, and heavier financial burden to patients [2,3]. Despite improvements in clinical treatment, the mortality of AKI remains unchanged and reaches as high as 40-44% in patients with sepsis due to multiorgan failure, microvascular dysfunction, and systemic inflammatory response syndrome [4][5][6]. However, AKI can be reversed at the early stage through timely intervention and effective treatment, thereby reducing AKI-related mortality [7]. Therefore, identifying patients with high risk of AKI is of vital importance for the management of patients with sepsis in ICU settings.
The prediction of AKI in patients with sepsis has long been an active topic in critical care medicine. Some biomarkers, such as microRNA-22-3p [8], neutrophil gelatinase-associated lipocalin [9], procalcitonin [10], urinary miR-26b [11], and soluble thrombomodulin [12], have been reported to be associated with AKI in sepsis. However, they are difficult to adopt widely in clinical settings because of their high cost and the specialized testing technology they require. Some scoring systems, including the Acute Physiology and Chronic Health Evaluation II, the Simplified Acute Physiology Score (SAPS) II, and the Sequential Organ Failure Assessment (SOFA), have also been used in AKI prediction, but their performance is unsatisfactory owing to poor specificity and sensitivity [13,14]. In addition, some multivariate predictive models based on traditional statistical methods, such as logistic regression (LR) and the Cox proportional hazards model, have been developed for predicting the development of AKI among patients with sepsis. Fan et al. [15] applied LR to develop a prediction model for AKI in 15,726 patients with sepsis, and the model showed good predictive accuracy. Importantly, the relationships between variables are complex and may be linear or nonlinear, which is especially prominent in ICU settings. However, LR by default models a linear relationship between the independent variables and the log-odds of the outcome, and may therefore oversimplify complex nonlinear relationships. Moreover, LR is susceptible to multicollinearity between variables, which may reduce the performance of the model. Therefore, exploring more effective and accurate prediction tools is extremely important in the management of septic patients.
Recently, machine learning (ML) has attracted the attention of and gained recognition from clinicians due to the evolution of statistical theory and computer technology. Novel ML techniques have been widely used in predictive models of various diseases and show better performance than traditional LR or Cox regression analyses [16,17]. Several studies have applied ML algorithms to AKI prediction. For example, Chiofolo et al. [18] developed an AKI prediction model using an automatic continuous random forest algorithm in critically ill patients, and achieved a good capability for early identification of high-risk patients. Le et al. [19] formulated a prediction system for AKI in ICU settings using convolutional neural networks (CNNs), and found that the CNN model outperformed the SOFA scoring system. Lin et al. [20] found that random forest had greater potential in predicting mortality in patients with AKI than support vector machine (SVM), artificial neural network (ANN), and SAPS II. However, evidence showing the advantage of ML algorithms in the prediction of AKI in septic patients is still lacking. In this study, we aimed to develop and validate multiple ML models to predict AKI in septic patients and to find the model with the best predictive performance.

Data source
Using the Structured Query Language, data of patients with sepsis were extracted from a single-center publicly available database called the Medical Information Mart for Intensive Care III (MIMIC-III) database [21]. The MIMIC-III database is an integrated, de-identified, comprehensive clinical dataset containing all patients admitted to the ICUs of Beth Israel Deaconess Medical Center in Boston, MA, from June 1, 2001, to October 31, 2012. The MIMIC-III includes detailed information about admitted and discharged patients, such as demographic characteristics, monitoring vital signs, laboratory and microbiological examination, imaging examination, observation and recording of intake and output, drug treatment, length of stay, survival data, and discharge or death records. To apply for access to the database, we passed the protection of human research participants examination and obtained the certificate (No. 9983480).

Participants
Patients were considered eligible if they were diagnosed with sepsis according to the International Classification of Diseases, 9th revision (ICD-9) codes (99591, 99592, 78552) after their first ICU admission. The Kidney Disease: Improving Global Outcomes (KDIGO) criteria [22] were then used to determine whether AKI occurred in patients with sepsis during hospitalization. Patients who left the ICU within 48 h, were aged < 18 years or > 89 years, or previously had AKI or renal failure were excluded. Moreover, patients with > 20% of individual data missing or who were receiving renal replacement therapy (RRT) or continuous RRT at admission were excluded.
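The eligibility rules and the AKI definition can be sketched in code. The following is a minimal illustration, not the authors' extraction code: the `Stay` fields and both function names are hypothetical, the handling of boundary values (for example, a stay of exactly 48 h) is an assumption, and the thresholds in `meets_kdigo_aki` are the standard KDIGO definition (serum creatinine rise of at least 0.3 mg/dL within 48 h, serum creatinine at least 1.5 times baseline within 7 days, or urine output below 0.5 mL/kg/h for 6 h).

```python
from dataclasses import dataclass


@dataclass
class Stay:
    """Hypothetical per-patient record; field names are illustrative only."""
    age: float                        # years at ICU admission
    icu_hours: float                  # length of ICU stay in hours
    prior_aki_or_renal_failure: bool  # history of AKI or renal failure
    rrt_at_admission: bool            # RRT or continuous RRT at admission
    missing_fraction: float           # share of this patient's variables missing


def is_eligible(s: Stay) -> bool:
    """Apply the exclusion criteria described in the Participants section."""
    if s.icu_hours < 48:
        return False
    if s.age < 18 or s.age > 89:
        return False
    if s.prior_aki_or_renal_failure or s.rrt_at_admission:
        return False
    return s.missing_fraction <= 0.20


def meets_kdigo_aki(baseline_scr, max_scr_48h, max_scr_7d, min_urine_ml_kg_h_6h):
    """Return True if any KDIGO AKI criterion is met.

    Creatinine values are in mg/dL; urine output is the lowest
    6-hour average rate in mL/kg/h.
    """
    rise_48h = (max_scr_48h - baseline_scr) >= 0.3
    ratio_7d = max_scr_7d >= 1.5 * baseline_scr
    low_urine = min_urine_ml_kg_h_6h < 0.5
    return rise_48h or ratio_7d or low_urine
```

For example, a patient with a baseline creatinine of 1.0 mg/dL that rises to 1.4 mg/dL within 48 h meets the first criterion regardless of urine output.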

Data extraction
Patient data in the initial 24 h following admission were retrieved from the MIMIC-III database. The following information was used in this study: (1) demographic features, including sex, age, and ethnicity; (2) comorbidities, including congestive heart failure, hypertension, chronic pulmonary disease, diabetes, and liver disease; (3) vital signs, including heart rate, temperature, oxygen saturation (SpO2), systolic blood pressure (SysBP), and diastolic blood pressure (DiasBP); (4) laboratory parameters, including total bilirubin, anion gap, albumin, chloride, potassium, sodium, lactate, partial thromboplastin time (PTT), prothrombin time (PT), international normalized ratio (INR), creatinine, blood urea nitrogen (BUN), and glucose; (5) therapeutic and clinical management, including mechanical ventilation and vasopressor use. For variables with multiple measurements, we included the maximum and minimum values for analysis. For SOFA and SAPS II scores, we only included the initial values for analysis. Because this was an epidemiological study based on hypothesis, no attempt was made to estimate the sample size of the study. Instead, all eligible patients in the MIMIC-III database were enrolled to maximize statistical power.
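The reduction of repeated measurements to a minimum and a maximum over the first 24 h can be illustrated as follows. This is only a sketch: the function name and the input layout (pairs of hours since admission and measured value) are assumptions made for illustration, not the structure of the MIMIC-III tables.

```python
def summarize_first_24h(measurements):
    """Return the min and max of values recorded in the first 24 h.

    `measurements` is a list of (hours_since_admission, value) pairs.
    Returns None when no value falls inside the window.
    """
    window = [value for hours, value in measurements if 0 <= hours <= 24]
    if not window:
        return None
    return {"min": min(window), "max": max(window)}


# Hypothetical creatinine series (mg/dL); the 30 h value is outside the window.
creatinine = [(1, 1.0), (5, 1.8), (30, 2.5)]
print(summarize_first_24h(creatinine))
```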
To minimize the bias resulting from missing data, variables with over 20% missing values were excluded from the final cohort, and missing values in the remaining variables were imputed using the multiple imputation (MI) method. MI is a widely used method for dealing with missing values [23]. MI replaces each missing value with multiple plausible values. This procedure takes into account the uncertainty behind the missing value and produces several datasets from which parameters of interest can be estimated [24]. For example, if the parameter of interest is the coefficient for a covariate in a multivariable model, the coefficient is estimated from each dataset, resulting in multiple coefficients. These coefficients are then combined to give a valid estimate that reflects the uncertainty in the estimation of missing values. The coefficient variance estimated by MI is less likely to be underestimated than that estimated by single imputation [25].
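The pooling step described above is commonly done with Rubin's rules: the pooled coefficient is the mean of the per-dataset coefficients, and its variance is the average within-imputation variance plus an inflated between-imputation variance. The sketch below assumes this standard rule; the text does not state which pooling procedure was used, and the function name and example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: coefficient estimated in each imputed dataset
    variances: squared standard error of that coefficient in each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficients from three imputed datasets.
print(pool_rubin([0.5, 0.6, 0.7], [0.01, 0.01, 0.01]))
```

The between-imputation term is what single imputation omits, which is why its variance estimates tend to be too small.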

Statistical analysis
Continuous variables were summarized as the median with interquartile range and were compared using the Wilcoxon rank-sum test. Categorical variables were expressed as number and percentage and were compared using the chi-square test or Fisher's exact test.
Feature selection is an important step in model construction. The Boruta algorithm was used to identify the most important features by comparing the Z-value of each feature against those of "shadow features". Shadow features are created by duplicating all real features and randomly shuffling the values of each copy. In each iteration, a random forest model is fitted to the real and shadow features together, and the Z-value of each real and shadow attribute is obtained from it. A real feature is regarded as "important" if its Z-value is greater than the maximal Z-value of the shadow features in multiple independent trials [26].
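The shadow-feature comparison can be illustrated with a deliberately simplified sketch. To keep the example self-contained, the random forest Z-value is replaced by the absolute Pearson correlation with the outcome as an importance proxy; this substitution is an assumption made for illustration and is not the Boruta implementation used in the study. The function names and the synthetic data are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_hits(columns, y, n_trials=20, seed=0):
    """Count, per real feature, the trials in which it beats the best shadow."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_trials):
        # Shadow features: shuffled copies of every real feature.
        shadows = []
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadows.append(shadow)
        threshold = max(abs_corr(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_corr(col, y) > threshold:
                hits[j] += 1
    return hits


# Synthetic data: feature 0 tracks the outcome, feature 1 is pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(200)]
informative = [v + data_rng.gauss(0, 0.1) for v in y]
noise = [data_rng.gauss(0, 1) for _ in y]
hits = shadow_hits([informative, noise], y)
print(hits)
```

A feature that exceeds the best shadow in nearly every trial would be retained; the real algorithm formalizes this decision with a statistical test over the hit counts.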
After feature selection, seven ML algorithms, including LR, k-nearest neighbors (KNN), SVM, decision tree, random forest, extreme gradient boosting (XGBoost), and ANN, were employed for model construction. Tenfold cross-validation was used for model training and validation to limit overfitting. Accordingly, the whole dataset was randomly divided into 10 folds. In each round, nine folds were used as the training set for model development, and the remaining fold was used as the validation set. The process was repeated 10 times so that each fold served as the validation set once. Finally, the performance of each model was evaluated and compared in the validation sets. For each algorithm, the model with the highest area under the curve (AUC) of the receiver operating characteristic (ROC) curve was selected as the optimal model. Because SOFA and SAPS II scores are commonly used tools for predicting illness severity and prognosis in critically ill patients, we also compared the predictive abilities of the ML-based models with those of the conventional scoring systems.
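The fold rotation can be sketched as follows. This is a minimal illustration of tenfold cross-validation on row indices; the function name, the seed, and the use of plain random shuffling (rather than, for example, stratification by outcome) are assumptions, since the splitting details are not given in the text.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# With the 3176 patients of this cohort, each fold holds 317 or 318 patients.
splits = list(k_fold_indices(3176, k=10))
print([len(val) for _, val in splits])
```

Each patient appears in exactly one validation fold, so every prediction used for evaluation comes from a model that did not see that patient during training.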
The performance of the predictive models was evaluated with respect to discrimination, calibration, and clinical utility. The discrimination was quantitatively evaluated by the AUC of the ROC curve, sensitivity, specificity, recall, accuracy, and F1 score. The calibration was visually assessed through graphical representations of the agreement between the predicted probabilities and the observed outcomes based on 1000 bootstrap resamples. The clinical utility was investigated by decision curve analysis (DCA). The statistical analyses and modeling process were conducted using R version 4.0.5, and a two-sided P-value < 0.05 was regarded as statistically significant.
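The discrimination metrics named above can be computed from predicted probabilities and observed outcomes as in the sketch below. It uses the textbook definitions, in which sensitivity and recall are the same quantity; the function names, the 0.5 classification threshold, and the toy data are assumptions for illustration and do not reproduce the study's analysis code.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def confusion_metrics(y_true, scores, threshold=0.5):
    """Sensitivity (recall), specificity, accuracy, and F1 at one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy example: three patients with AKI (1) and two without (0).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]
print(auc(y_true, scores), confusion_metrics(y_true, scores))
```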

Baseline characteristics
A total of 6138 patients were diagnosed with sepsis at admission according to ICD-9. Of these, 2961 patients were excluded according to the exclusion criteria (Fig. 1). Finally, a total of 3176 patients were included in our analysis, of which 2397 patients (75.5%) had AKI after ICU admission.
The differences in characteristics between the AKI and non-AKI groups are described in Table 1. Male patients were more likely to develop AKI than female patients during hospitalization. Patients who suffered from AKI had higher age and BMI; a higher incidence of congestive heart failure, cardiac arrhythmias, hypertension, liver disease, paralysis, chronic pulmonary disease, diabetes, and coagulopathy; and a higher rate of mechanical [...]. However, the levels of urine output and eGFR in the AKI group were lower than those in the non-AKI group (P < 0.05).

Feature selection
The result of feature screening based on the Boruta algorithm is shown in Fig. 2.

Model performance comparisons
We generated seven ML models and two scoring systems to predict the development of AKI in patients with sepsis after ICU admission. Figure 3 shows the discriminative performance of the nine models in terms of ROC curves. Among the nine models, the XGBoost model (AUC = 0.817) had the best predictive performance for AKI in septic patients, followed by the random forest model (AUC = 0.[...]). Table 2 presents a set of detailed performance metrics for the nine models. The XGBoost model had the best discrimination, with the highest sensitivity (0.945), accuracy (0.832), recall (0.852), and F1 score (0.895), and the third highest specificity (0.913). In Additional file 1: Figure S1, the calibration curves showed that the XGBoost model performed best among the seven ML models. According to the DCA curves (Fig. 4), the XGBoost model exhibited greater net benefit across the range of threshold probabilities than the other models, indicating that the XGBoost model was the optimal model with favorable clinical utility.
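The net benefit plotted in a decision curve has a simple closed form: at threshold probability pt, it is the true-positive rate per patient minus the false-positive rate per patient weighted by pt / (1 - pt). The sketch below uses this standard DCA definition; the function names and the toy data are assumptions for illustration and are not taken from the study's analysis.

```python
def net_benefit(y_true, scores, pt):
    """Net benefit of treating patients whose predicted risk is at least pt."""
    if not 0 < pt < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (pt / (1 - pt))


# Toy example: three patients with AKI (1) and two without (0).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]
print(net_benefit(y_true, scores, 0.2), net_benefit_treat_all(y_true, 0.2))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the net benefit of treating no one.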

Feature importance in XGBoost models
The ranks of feature importance in the XGBoost model are shown in Fig. 5. Urine output, mechanical ventilation, BMI, eGFR, minimum creatinine, maximum PTT, and minimum BUN were the most important features that contributed to AKI in critically ill patients with sepsis.

Discussion
Compared with some previous reports on AKI prediction in critically ill patients using the MIMIC-III dataset [18][19][20], our research makes several novel contributions. For the first time, our study included seven commonly used ML algorithms for comprehensive analyses and compared their predictive performance with that of traditional scoring systems, including the SOFA and SAPS II scoring systems. The ML models showed good predictive accuracy in terms of discrimination and calibration, but good accuracy is not the same as usefulness in clinical practice. When the threshold probabilities at which a model offers net benefit are impractical, a model with good performance may still have limited applicability [27]. Therefore, we applied DCA curves to validate the clinical applicability of the ML models. Finally, the Boruta algorithm helped us to understand the importance of the independent variables and thus to carry out feature selection more effectively.
The incidence of sepsis is increasing in critically ill patients worldwide and is associated with high mortality and economic burden [28]. Sepsis is a well-known risk factor for AKI, as the kidney is very sensitive to hypoperfusion and to some interventions, such as mechanical ventilation and excessive fluid resuscitation. At present, the treatment of AKI in sepsis remains reactive and nonspecific, and no preventive treatment is available. The presence of AKI markedly increases mortality in septic patients, which ranges from 38.2 to 70.2% [29,30]. Hernando et al. [31] found that AKI occurs in 40-50% of septic patients, with a six- to eightfold increase in mortality. Furthermore, a prospective cohort study including 401 critically ill patients revealed that the incidence of AKI was 50.1% in patients with severe sepsis, which is 7.79 times higher than that in patients without sepsis [32]. However, active treatment at the early stage of AKI can improve the survival rate [33]. Some studies have found that early renal recovery in sepsis-related AKI can not only improve the survival rate, but also contribute to the later recovery of patients after discharge [34][35][36]. Unfortunately, it is difficult for clinicians to identify patients at high risk of AKI in the ICU. Therefore, developing and promoting reliable prediction models is particularly urgent for identifying these patients and providing them with timely and effective interventions to improve their prognosis.
In this study, the traditional severity scoring systems, such as the SOFA and SAPS II scores, performed worse than the ML models, suggesting that they might not be effective tools for AKI prediction in critically ill patients with sepsis. Although the SOFA and SAPS II scoring systems can be used to assess the risk of adverse outcomes in critically ill patients, these scores largely depend on the experience of the practitioners [37]. Moreover, these scoring systems leave out a large number of valuable variables, resulting in worse predictive performance than that of multivariate models [38]. Previous studies have revealed that, compared with ML models, the SOFA and SAPS II scoring systems have some disadvantages, such as poor prediction performance, low sensitivity and specificity, a wide fluctuation range, and a cumbersome process [16].
Our results showed that the XGBoost model had a better capability than the LR model for predicting AKI in septic patients. On the one hand, the LR algorithm requires researchers to manually select independent variables, cannot detect complex nonlinear relationships and interactions between the independent variables X and the response value Y, and is sensitive to multicollinearity among the independent variables, which may result in an underfitted and inaccurate model. On the other hand, the XGBoost model can efficiently and flexibly deal with missing data and combine weak prediction models to establish accurate prediction models. Owing to its precision and performance, the XGBoost algorithm is increasingly regarded as a competitive alternative to LR analysis in predicting adverse clinical outcomes. Among all ML models, the XGBoost model performed best in AKI prediction, which is consistent with some previous studies. Liu et al. [39] demonstrated that the predictive performance of the XGBoost model was superior to that of three other ML models, including LR, SVM, and random forest, for predicting mortality in patients with AKI. Zhu et al. [40] found that the XGBoost model outperformed the KNN, LR, decision tree, random forest, and ANN models in the prediction of hospital mortality for mechanically ventilated patients. Moreover, a meta-analysis revealed that XGBoost was more effective than LR and other ML algorithms, including ANN, SVM, and Bayesian network, in the prediction of AKI [41].
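The idea of combining weak prediction models can be made concrete with a toy gradient-boosting loop. The sketch below fits one-split decision stumps to the residuals of the current prediction and adds them with a learning rate. It is plain gradient boosting on squared error with a single feature, written for illustration only; it is not XGBoost itself, which adds regularization, second-order gradient information, and native handling of missing values, and all names and data here are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    # The largest x is skipped so that the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Additively combine stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data with a pattern that no single stump can capture.
xs = [float(i) for i in range(10)]
ys = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
_, _, preds_1 = boost(xs, ys, n_rounds=1)
_, _, preds_30 = boost(xs, ys, n_rounds=30)
print(mse(ys, preds_1), mse(ys, preds_30))
```

Each stump alone is a weak predictor, but the training error keeps falling as more are added, which is the behavior the paragraph above refers to.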
This study is the first to apply ML algorithms for predicting the development of AKI during hospitalization in patients with sepsis. Through the XGBoost model, we identified that urine output, mechanical ventilation, BMI, eGFR, minimum creatinine, maximum PTT, and minimum BUN were most strongly associated with the development of AKI in patients with sepsis. Among these features, urine output was the most important indicator of AKI, which is in accordance with the KDIGO recommendations. In addition to urine output, some measures of renal function, such as eGFR, BUN, and serum creatinine, also played an important role in the prediction of kidney disease. These findings are consistent with many clinical studies. Mertoglu et al. [42] found that serum creatinine and BUN have greater diagnostic value than other novel markers, including myo-inositol oxygenase and cystatin C. Laranja et al. [4] revealed that septic patients with AKI had lower urine output than patients with AKI from other causes or chronic kidney disease. Grams et al. [43] demonstrated that low eGFR was a reliable risk factor for AKI through a meta-analysis including more than 1 million participants from eight countries. Notably, mechanical ventilation was also significantly associated with AKI in septic patients. Positive-pressure mechanical ventilation (PPV) is commonly used in critically ill patients to provide oxygenation, ventilation, and airway protection support. However, PPV has long been considered to have potentially harmful effects on the kidney [30]. This may be due to the following three reasons. First, PPV may increase intrathoracic pressure and thus reduce venous return, cardiac output, and renal perfusion. Second, mechanical ventilation may induce the release of some neurohormones, affect the renin-angiotensin system, and decrease renal blood flow and eGFR. Third, mechanical ventilation at any volume or pressure might trigger an inflammatory cascade, involving multiple interleukins, tumor necrosis factor-α, and Fas ligand, that may contribute to AKI. Moreover, PTT is a common indicator of coagulation function. Recently, a retrospective study [44] showed that more than half of patients with septic AKI had at least one abnormal coagulation index, and coagulation dysfunction may predict poor outcomes in these patients. BMI is a simple and useful index of obesity based on the height and weight of patients. BMI has been widely studied in patients with sepsis and AKI. Our findings showed that the AKI group had a higher BMI than the non-AKI group. Obesity can lead to glomerular hyperperfusion and hyperfiltration, increase the hemodynamic and metabolic burden of a single glomerulus, and activate inflammation of adipocytes and oxidative stress [45], increasing the risk and progression of AKI. As these indexes can be evaluated easily at hospital admission, they can be used as convenient predictors for the development of AKI in critically ill patients with sepsis.
In this study, the in-hospital incidence of AKI in septic patients was 75.5%, which was similar to that reported in some previous studies. According to the report by Fan et al. [15], the incidence rate of AKI was 61% in 15,508 patients with sepsis. Tejera et al. [32] conducted a retrospective study in 401 critically ill patients and found that the incidence of AKI was as high as 75.3%. The pathogenesis of AKI in sepsis is complex and has not been clarified yet. Hemodynamic instability, impaired endothelial function, infiltration of inflammatory cells in the renal parenchyma, renal thrombosis, and renal tubular necrosis have been hypothesized to contribute to the development of AKI in septic patients [46]. The hyperactivation of the immune response caused by sepsis is particularly important in the pathogenesis process, which includes proinflammatory and anti-inflammatory stages. In the proinflammatory stage, humoral and cellular immunity can cause a storm of inflammatory factors, leading to the excessive secretion of inflammatory factors (such as interleukin 1 and tumor necrosis factor-α), the activation of the complement and coagulation systems, the activation of hyaluronic acid and elastase, and eventually the reduction of renal blood flow and the occurrence of AKI and septic shock [6,47]. Subsequently, patients will have a compensatory anti-inflammatory response, which is an immunosuppressive state, manifested by increased secretion of cytokines (such as interleukin 10), weakened endocytosis, reduced proliferation of lymphocytes, and increased apoptosis [29]. Thus, patients with sepsis are a high-risk group for AKI during hospitalization. Once AKI occurs, the prognosis is significantly worse, and even RRT cannot improve the prognosis.
We should acknowledge some limitations of this research. First, the retrospective and observational nature of our study may lead to inevitable selection bias. Second, the MIMIC-III data came from a single center in the United States, which may limit the generalizability of the prediction model to other populations. Therefore, further research with large samples and multiple centers is necessary to externally validate the models. Third, we used imputation to estimate some missing data, and the imputed values may deviate from the true values. However, we still believe that the constructed model can help clinicians to provide timely treatment to ICU patients with sepsis who are at high risk of developing AKI.

Conclusions
In conclusion, the ML models can be reliable tools for predicting AKI in septic patients. Among all of the predictive models, the XGBoost model is the most effective model, which may assist clinicians in tailoring precise management and implementing early interventions for septic patients at risk of AKI to reduce mortality.